Learning Graph Walk Based Similarity Measures for Parsed Text

Authors

  • Einat Minkov
  • William W. Cohen
Abstract

We consider a parsed text corpus as an instance of a labelled directed graph, where nodes represent words and weighted directed edges represent the syntactic relations between them. We show that graph walks, combined with existing techniques of supervised learning, can be used to derive a task-specific word similarity measure in this graph. We also propose a new path-constrained graph walk method, in which the graph walk process is guided by high-level knowledge about meaningful edge sequences (paths). Empirical evaluation on the task of named entity coordinate term extraction shows that this framework is preferable to vector-based models for small-sized corpora. It is also shown that the path-constrained graph walk algorithm yields both performance and scalability gains.
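The graph-walk idea in the abstract can be illustrated with a small sketch. The Python below is not the authors' implementation: it assumes a simple lazy random walk over (source, edge type, target) triples, with hand-set edge-type weights standing in for the weights that supervised learning would provide. The function name `graph_walk`, the `stay_prob` parameter, the inverse edge labels, and the toy edges are all hypothetical.

```python
from collections import defaultdict


def graph_walk(edges, start, type_weights, steps=3, stay_prob=0.5):
    """Return a probability distribution over nodes after a finite lazy walk.

    edges        -- iterable of (source, edge_type, target) triples
    type_weights -- weight per edge type (hand-set here; an assumption)
    stay_prob    -- probability of staying at the current node each step
    """
    # Index outgoing edges, keeping only edge types with positive weight.
    out = defaultdict(list)
    for src, etype, dst in edges:
        weight = type_weights.get(etype, 0.0)
        if weight > 0:
            out[src].append((dst, weight))

    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, prob in dist.items():
            neighbours = out.get(node, [])
            if not neighbours:
                # Dead end: all probability mass stays where it is.
                nxt[node] += prob
                continue
            nxt[node] += stay_prob * prob
            total = sum(w for _, w in neighbours)
            for dst, w in neighbours:
                # Move along an edge in proportion to its type weight.
                nxt[dst] += (1.0 - stay_prob) * prob * w / total
        dist = dict(nxt)
    return dist


# Toy parsed-text graph; "-inv" edges are hypothetical inverse relations.
EDGES = [
    ("Paris", "conj", "London"),
    ("London", "conj-inv", "Paris"),
    ("visited", "dobj", "Paris"),
    ("Paris", "dobj-inv", "visited"),
]
WEIGHTS = {"conj": 2.0, "conj-inv": 2.0, "dobj": 1.0, "dobj-inv": 1.0}

scores = graph_walk(EDGES, "Paris", WEIGHTS, steps=2)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(word, round(score, 3))
```

Nodes that receive more probability mass are treated as more similar to the start word. Raising the weight of an edge type steers the walk toward words connected through that relation, which is the kind of task-specific tuning the abstract attributes to learning.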


Similar resources

Graph Based Similarity Measures for Synonym Extraction from Parsed Text

We learn graph-based similarity measures for the task of extracting word synonyms from a corpus of parsed text. A constrained graph walk variant that has been successfully applied in the past in similar settings is shown to outperform a state-of-the-art syntactic vector-based approach on this task. Further, we show that learning specialized similarity measures for different word types is advanta...


Adaptive Graph Walk Based Similarity Measures in Entity-Relation Graphs

Relational or semi-structured data is naturally represented by a graph schema, where nodes denote entities and directed typed edges represent the relations between them. Such graphs are heterogeneous in the sense that they describe different types of objects and multiple types of links. For example, email data can be described in a graph that includes messages, persons, dates and other objects;...


Natural Language Engineering

We consider a dependency-parsed text corpus as an instance of a labeled directed graph, where nodes represent words and weighted directed edges represent the syntactic relations between them. We show that graph walks, combined with existing techniques of supervised learning that model local and global information about the graph walk process, can be used to derive a task-specific word similarit...


A Graphical Framework For Contextual Search And Name Disambiguation In Email

Similarity measures for text have historically been an important tool for solving information retrieval problems. In this paper we consider extended similarity metrics for documents and other objects embedded in graphs, facilitated via a lazy graph walk. We provide a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a struct...


Learning to Walk Structured Text Networks

We propose representing a text corpus as a labeled directed graph, where nodes represent words and weighted edges represent the syntactic relations between them, as derived by dependency parsing. Given this graph, we adopt a graph-based similarity measure based on random walks to derive a similarity measure between words, and also use supervised learning to improve the derived similarity measur...



Publication date: 2008